16 research outputs found

    Complementing Brightness Constancy with Deep Networks for Optical Flow Prediction

    Full text link
    State-of-the-art methods for optical flow estimation rely on deep learning, which requires complex sequential training schemes to reach optimal performance on real-world data. In this work, we introduce the COMBO deep network that explicitly exploits the brightness constancy (BC) model used in traditional methods. Since BC is an approximate physical model violated in several situations, we propose to train a physically-constrained network complemented with a data-driven network. We introduce a unique and meaningful flow decomposition between the physical prior and the data-driven complement, including an uncertainty quantification of the BC model. We derive a joint training scheme for learning the different components of the decomposition, ensuring an optimal cooperation in both supervised and semi-supervised contexts. Experiments show that COMBO can improve performance over state-of-the-art supervised networks, e.g. RAFT, reaching state-of-the-art results on several benchmarks. We highlight how COMBO can leverage the BC model and adapt to its limitations. Finally, we show that our semi-supervised method can significantly simplify the training procedure.
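
    A minimal sketch of the decomposition idea described in this abstract, assuming hypothetical modules physical_net (flow under the BC prior), complement_net (data-driven correction) and uncertainty_net (per-pixel BC uncertainty); the paper's actual architecture, losses and training scheme are not reproduced here.

    # Minimal sketch (not the authors' code): combine a physically-constrained
    # flow estimate with a data-driven complement, gated per pixel by the
    # estimated uncertainty of the brightness constancy (BC) model.
    import torch
    import torch.nn as nn

    class ComboLikeFlow(nn.Module):
        def __init__(self, physical_net: nn.Module, complement_net: nn.Module,
                     uncertainty_net: nn.Module):
            super().__init__()
            self.physical_net = physical_net        # flow under the BC prior
            self.complement_net = complement_net    # data-driven correction
            self.uncertainty_net = uncertainty_net  # per-pixel BC uncertainty

        def forward(self, img1: torch.Tensor, img2: torch.Tensor):
            flow_phys = self.physical_net(img1, img2)    # (B, 2, H, W)
            flow_comp = self.complement_net(img1, img2)  # (B, 2, H, W)
            alpha = torch.sigmoid(self.uncertainty_net(img1, img2))  # (B, 1, H, W), in [0, 1]
            # Where BC is reliable (alpha close to 0) the physical estimate dominates;
            # where it is violated (alpha close to 1) the data-driven complement takes over.
            flow = (1.0 - alpha) * flow_phys + alpha * flow_comp
            return flow, flow_phys, flow_comp, alpha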

    Hybrid Energy Based Model in the Feature Space for Out-of-Distribution Detection

    Full text link
    Out-of-distribution (OOD) detection is a critical requirement for the deployment of deep neural networks. This paper introduces the HEAT model, a new post-hoc OOD detection method estimating the density of in-distribution (ID) samples using hybrid energy-based models (EBM) in the feature space of a pre-trained backbone. HEAT complements prior density estimators of the ID density, e.g. parametric models like the Gaussian Mixture Model (GMM), to provide an accurate yet robust density estimation. A second contribution is to leverage the EBM framework to provide a unified density estimation and to compose several energy terms. Extensive experiments demonstrate the significance of the two contributions. HEAT sets new state-of-the-art OOD detection results on the CIFAR-10 / CIFAR-100 benchmarks as well as on the large-scale ImageNet benchmark. The code is available at: https://github.com/MarcLafon/heatood
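
    A minimal sketch of the energy-composition idea, assuming features extracted by a pre-trained backbone, a scikit-learn GaussianMixture fitted on in-distribution features, and a hypothetical residual_energy callable standing in for the learned EBM correction; the actual HEAT energies and training procedure are in the linked repository.

    # Minimal sketch (an assumption, not the HEAT implementation): compose an energy
    # from a parametric ID density estimate (GMM over backbone features) with a
    # learned residual energy, and score samples by the total energy.
    import numpy as np
    from sklearn.mixture import GaussianMixture

    def fit_gmm(id_features: np.ndarray, n_components: int = 10) -> GaussianMixture:
        gmm = GaussianMixture(n_components=n_components, covariance_type="full")
        gmm.fit(id_features)
        return gmm

    def ood_score(features: np.ndarray, gmm: GaussianMixture,
                  residual_energy=None) -> np.ndarray:
        # Energy of the parametric estimator: negative log-likelihood under the GMM.
        energy = -gmm.score_samples(features)
        # Optional learned correction term (hypothetical callable returning one
        # energy value per sample), added in the EBM composition.
        if residual_energy is not None:
            energy = energy + residual_energy(features)
        return energy  # higher energy -> more likely out-of-distribution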

    Hierarchical Average Precision Training for Pertinent Image Retrieval

    Full text link
    Image retrieval is commonly evaluated with Average Precision (AP) or Recall@k. Yet those metrics are limited to binary labels and do not take into account the severity of errors. This paper introduces a new hierarchical AP training method for pertinent image retrieval (HAPPIER). HAPPIER is based on a new H-AP metric, which leverages a concept hierarchy to refine AP by integrating the importance of errors and to better evaluate rankings. To train deep models with H-AP, we carefully study the problem's structure and design a smooth lower-bound surrogate combined with a clustering loss that ensures consistent ordering. Extensive experiments on 6 datasets show that HAPPIER significantly outperforms state-of-the-art methods for hierarchical retrieval, while being on par with the latest approaches when evaluating fine-grained ranking performance. Finally, we show that HAPPIER leads to a better organization of the embedding space and prevents the most severe failure cases of non-hierarchical methods. Our code is publicly available at: https://github.com/elias-ramzi/HAPPIER
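
    A minimal sketch of a hierarchical AP-style metric under simple assumptions: each retrieved item receives a graded relevance equal to the fraction of hierarchy levels it shares with the query, and the AP accumulation uses these graded relevances instead of binary labels. The exact H-AP definition and the smooth training surrogate differ and are given in the HAPPIER repository.

    # Minimal sketch (not the exact H-AP of the paper): AP-style ranking score
    # with graded relevances derived from a label hierarchy.
    from typing import Sequence

    def graded_relevance(query_path: Sequence[str], item_path: Sequence[str]) -> float:
        # Relevance = fraction of hierarchy levels shared with the query, e.g.
        # ("vehicle", "car", "suv") vs ("vehicle", "car", "sedan") -> 2/3.
        shared = 0
        for q, i in zip(query_path, item_path):
            if q != i:
                break
            shared += 1
        return shared / max(len(query_path), 1)

    def hierarchical_ap(query_path, ranked_item_paths) -> float:
        rels = [graded_relevance(query_path, p) for p in ranked_item_paths]
        total = sum(rels)
        if total == 0.0:
            return 0.0
        score, cum = 0.0, 0.0
        for k, rel in enumerate(rels, start=1):
            cum += rel
            score += rel * (cum / k)  # graded precision at rank k, weighted by relevance
        return score / total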

    Now you see me: finding the right observation space to learn diverse behaviours by reinforcement in games

    No full text
    Training virtual agents to play a game using reinforcement learning (RL) has gained a lot of traction in recent years. Indeed, RL has delivered agents with superhuman performance on multiple gameplays. Yet, from a human-machine interaction standpoint, raw performance is not the only dimension of a "good" game AI. Exhibiting diversified behaviours is key to generating novelty, one of the core components of player engagement. In the RL framework, teaching agents to discover multiple strategies to achieve the same task is often framed as skill discovery. However, we observe that the current RL literature defines diversity as the exploration of different states, i.e. the incentive of the agent to "see" new observations. In this work, we argue that this definition does not make sense from a gameplay point of view. Instead, diversity should be defined as a distance on observations from an observer external to the agent. We illustrate how DIAYN/SMERL, state-of-the-art RL algorithms for skill discovery, fail to discover meaningful behaviours in a simple tag game. We propose an easy fix by introducing the notion of diversity spaces, defined as the observations gathered by a third party external to the agent.
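
    A minimal sketch of the diversity-space idea, assuming trajectories of agent states can be projected into a third-party observer's view through a hypothetical external_observation function: diversity is scored as pairwise distance between skills in that external observation space rather than in the agent's own observations.

    # Minimal sketch (an assumption, not the paper's algorithm): measure the diversity
    # of a set of skills by pairwise distances between their trajectories as seen
    # by an external observer, not by the agent itself.
    import numpy as np

    def diversity_in_observer_space(trajectories_per_skill, external_observation):
        # trajectories_per_skill: one list of agent states per skill.
        # external_observation: maps an agent state to the observer's view (a vector).
        summaries = []
        for traj in trajectories_per_skill:
            obs = np.stack([external_observation(s) for s in traj])
            summaries.append(obs.mean(axis=0))  # summarise each skill by its mean view
        n = len(summaries)
        if n < 2:
            return 0.0
        # Diversity = average pairwise distance between skill summaries.
        dists = [np.linalg.norm(summaries[i] - summaries[j])
                 for i in range(n) for j in range(i + 1, n)]
        return float(np.mean(dists))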

    Introducing spatial regularization in SAR tomography reconstruction

    Get PDF
    The resolution achieved by current Synthetic Aperture Radar (SAR) sensors provides detailed visualization of urban areas. Spaceborne sensors such as TerraSAR-X can be used to analyze large areas at a very high resolution. In addition, repeated passes of the satellite give access to temporal and interferometric information on the scene. Because of the complex 3-D structure of urban surfaces, scatterers located at different heights (ground, building façade, roof) produce radar echoes that often get mixed within the same radar cells. These echoes must be numerically unmixed in order to get a fine understanding of the radar images. This unmixing is at the core of SAR tomography. SAR tomography reconstruction is generally performed in two steps: (i) reconstruction of the so-called tomogram by vertical focusing, at each radar resolution cell, to extract the complex amplitudes (a 1-D processing); (ii) transformation from radar geometry to ground geometry and extraction of significant scatterers. We propose to perform the tomographic inversion directly in ground geometry in order to enforce spatial regularity in 3-D space. This inversion requires solving a large-scale non-convex optimization problem. We describe an iterative method based on variable splitting and the augmented Lagrangian technique. Spatial regularizations can easily be included in this generic scheme. We illustrate, on simulated data and a TerraSAR-X tomographic dataset, the potential of this approach to produce 3-D reconstructions of urban surfaces.
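
    A minimal sketch of the variable-splitting / augmented-Lagrangian scheme the abstract refers to, written for a generic linear forward operator A and a plug-in spatial regularizer (hypothetical prox_regularizer); the actual tomographic operator, the non-convex sparsity term and the 3-D ground-geometry regularizers are specific to the paper.

    # Minimal sketch (generic ADMM-style iteration, not the paper's exact solver):
    # solve  min_x ||A x - y||^2 + R(z)  subject to  x = z  by alternating a
    # least-squares update, a proximal step on the regularizer, and a dual update.
    import numpy as np

    def admm_reconstruction(A, y, prox_regularizer, rho=1.0, n_iter=100):
        m, n = A.shape
        x = np.zeros(n, dtype=A.dtype)
        z = np.zeros(n, dtype=A.dtype)
        u = np.zeros(n, dtype=A.dtype)  # scaled dual variable
        AtA = A.conj().T @ A
        Aty = A.conj().T @ y
        lhs = AtA + rho * np.eye(n)
        for _ in range(n_iter):
            # x-update: quadratic data-fidelity term, solved in closed form.
            x = np.linalg.solve(lhs, Aty + rho * (z - u))
            # z-update: proximal operator of the (possibly non-convex) regularizer.
            z = prox_regularizer(x + u, 1.0 / rho)
            # Dual update enforcing the splitting constraint x = z.
            u = u + x - z
        return z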

    Memory transformers for full context and high-resolution 3D Medical Segmentation

    No full text
    Transformer models achieve state-of-the-art results for image segmentation. However, achieving long-range attention, necessary to capture global context, with high-resolution 3D images is a fundamental challenge. This paper introduces the Full resolutIoN mEmory (FINE) transformer to overcome this issue. The core idea behind FINE is to learn memory tokens to indirectly model full-range interactions while scaling well in both memory and computational costs. FINE introduces memory tokens at two levels: the first one allows full interaction between voxels within local image regions (patches), the second one allows full interactions between all regions of the 3D volume. Combined, they allow full attention over high-resolution images, e.g. 512 x 512 x 256 voxels and above. Experiments on the BCV image segmentation dataset show better performance than state-of-the-art CNN and transformer baselines, highlighting the superiority of our full attention mechanism compared to recent transformer baselines, e.g. CoTr and nnFormer.
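
    A minimal sketch of the memory-token idea under stated assumptions (hypothetical module and token counts, not the FINE architecture): the voxel tokens of one local patch attend jointly to their own patch and to learned memory tokens, some intended for within-patch context and some shared across the whole volume, giving an indirect path to full-volume context without full pairwise attention.

    # Minimal sketch (an assumption, not the FINE implementation): attention over one
    # patch augmented with local and global (volume-level) memory tokens.
    import torch
    import torch.nn as nn

    class MemoryAugmentedBlock(nn.Module):
        def __init__(self, dim: int = 64, n_heads: int = 4,
                     n_local_mem: int = 4, n_global_mem: int = 8):
            super().__init__()
            self.local_mem = nn.Parameter(torch.randn(n_local_mem, dim))    # tokens for within-patch context
            self.global_mem = nn.Parameter(torch.randn(n_global_mem, dim))  # tokens shared across the volume
            self.attn = nn.MultiheadAttention(dim, n_heads, batch_first=True)

        def forward(self, patch_tokens: torch.Tensor) -> torch.Tensor:
            # patch_tokens: (num_patches_in_batch, tokens_per_patch, dim)
            b = patch_tokens.shape[0]
            mem = torch.cat([self.local_mem, self.global_mem], dim=0)
            mem = mem.unsqueeze(0).expand(b, -1, -1)
            context = torch.cat([patch_tokens, mem], dim=1)
            # Each voxel token attends to its patch and to the memory tokens.
            out, _ = self.attn(patch_tokens, context, context)
            return out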